Chore: BufferTransformer and Blowfish Refactor #4151

reneme · 2024-06-26T15:16:16Z

This picks up the work started in #3942 by proposing a BufferTransformer class; which is essentially an orchestrator for BufferSlicer and BufferStuffer. As mentioned yesterday, I added some template fanciness, to handle the "block processing" necessary for various block ciphers. As suggested previously, this uses Blowfish to demonstrate the functionality (+ spanifying the blowfish implementation).

@randombit Is the process_blocks_of<> implementation in line with what you had in mind previously:

I wonder if we can fit in some other common functionality there, namely the word level loads/stores. That's a very common task (encrypt+decrypt for N ciphers plus alternative implementations)

Here's the idea of BufferTransformer::process_blocks_of, where the multi-block loop handling and buffer management is entirely abstracted. Also, as an additional goodie, we now have explicit compile-time knowledge of the blocked data length inside the processing lambdas. Whether the compiler uses this or not has to be seen. FWIW: I don't see a difference in the performance of Blowfish on my Mac.

void encrypt_n(const uint8_t inb[], uint8_t outb[], size_t blocks) {
   std::span<const uint8_t> in(inb, inb + blocks*BS);
   std::span<uint8_t> out(outb, outb + blocks*BS);
   
   BufferTransformer bt(in, out);
   bt.process_blocks_of<BS*4, BS>(
      overloaded{
         [](std::span<const uint8_t, BS*4> in, std::span<uint8_t, BS*4> out) {
            // encrypt four blocks at a time
         },
         [](std::span<const uint8_t, BS> in, std::span<uint8_t, BS> out) {
            // encrypt a single block
         }
      }
   );
}

No hard feelings if that doesn't make it into 3.5.0 at all. I just nerd-sniped myself yesterday and wanted to sketch this out. And frankly, I think the BufferTransformer implementation is quite concise. 😄

coveralls · 2024-06-27T07:58:38Z

coverage: 91.742% (+0.02%) from 91.72%
when pulling 7b3c762 on Rohde-Schwarz:rene/buffer_transformer
into 66216e1 on randombit:master.

reneme · 2024-08-05T12:45:29Z

As suggested, I also applied this to the AES-NI-128 encryption path to verify that BufferTransformer boils down to a tight loop without significant overhead. I didn't add the patch to the PR as I didn't transform all encrypt/decrypt code paths for this test.

See the resulting code here (clang 18): #4287 (comment)

... and the corresponding C++ code:

BufferTransformer(std::span{in, blocks * BLOCK_SIZE}, std::span{out, blocks * BLOCK_SIZE})
    .process_blocks_of<4 * BLOCK_SIZE, BLOCK_SIZE>(overloaded{
        [&](std::span<const uint8_t, 4 * BLOCK_SIZE> in, std::span<uint8_t, 4 * BLOCK_SIZE> out) {
            SIMD_4x32 B0 = SIMD_4x32::load_le(in.data() + 16 * 0);
            SIMD_4x32 B1 = SIMD_4x32::load_le(in.data() + 16 * 1);
            SIMD_4x32 B2 = SIMD_4x32::load_le(in.data() + 16 * 2);
            SIMD_4x32 B3 = SIMD_4x32::load_le(in.data() + 16 * 3);

            keyxor(K0, B0, B1, B2, B3);
            aesenc(K1, B0, B1, B2, B3);
            aesenc(K2, B0, B1, B2, B3);
            aesenc(K3, B0, B1, B2, B3);
            aesenc(K4, B0, B1, B2, B3);
            aesenc(K5, B0, B1, B2, B3);
            aesenc(K6, B0, B1, B2, B3);
            aesenc(K7, B0, B1, B2, B3);
            aesenc(K8, B0, B1, B2, B3);
            aesenc(K9, B0, B1, B2, B3);
            aesenclast(K10, B0, B1, B2, B3);

            B0.store_le(out.data() + 16 * 0);
            B1.store_le(out.data() + 16 * 1);
            B2.store_le(out.data() + 16 * 2);
            B3.store_le(out.data() + 16 * 3);
            },
        [&](std::span<const uint8_t, BLOCK_SIZE> in, std::span<uint8_t, BLOCK_SIZE> out) {
            SIMD_4x32 B0 = SIMD_4x32::load_le(in.data());

            B0 ^= K0;
            aesenc(K1, B0);
            aesenc(K2, B0);
            aesenc(K3, B0);
            aesenc(K4, B0);
            aesenc(K5, B0);
            aesenc(K6, B0);
            aesenc(K7, B0);
            aesenc(K8, B0);
            aesenc(K9, B0);
            aesenclast(K10, B0);

            B0.store_le(out.data());
        },
    });
}

coveralls · 2024-08-05T13:19:18Z

coverage: 91.37% (+0.03%) from 91.344%
when pulling c88455d on Rohde-Schwarz:rene/buffer_transformer
into 13edb92 on randombit:master.

Introduce BufferTransformer

f3a63dd

reneme added the enhancement Enhancement or new feature label Jun 26, 2024

reneme requested a review from randombit June 26, 2024 15:16

reneme force-pushed the rene/buffer_transformer branch 2 times, most recently from 4feb67d to 87d4035 Compare June 27, 2024 07:15

reneme mentioned this pull request Jun 27, 2024

Chore: BufferTransformer #3942

Closed

reneme force-pushed the rene/buffer_transformer branch from 87d4035 to 7b3c762 Compare June 27, 2024 07:28

reneme mentioned this pull request Jul 1, 2024

[std::span] Pass std::span buffers into Block_Cipher Implementations #3870

Closed

reneme mentioned this pull request Jul 10, 2024

Add EC_Scalar and EC_AffinePoint types #4042

Merged

reneme mentioned this pull request Aug 5, 2024

Add support for AVX2-VAES #4287

Merged

reneme added 3 commits August 5, 2024 14:48

generalize BufferTransformer for block-wise processing

df20528

use BufferTransformer in blowfish

8247124

Chore: refactor blowfish using std::span<> where sensible

c88455d

reneme force-pushed the rene/buffer_transformer branch from 7b3c762 to c88455d Compare August 5, 2024 12:48

reneme mentioned this pull request Aug 5, 2024

Load/Store for SIMD type wrappers #4288

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Chore: BufferTransformer and Blowfish Refactor #4151

Chore: BufferTransformer and Blowfish Refactor #4151

reneme commented Jun 26, 2024 •

edited

Loading

coveralls commented Jun 27, 2024

reneme commented Aug 5, 2024 •

edited

Loading

coveralls commented Aug 5, 2024

Chore: BufferTransformer and Blowfish Refactor #4151

Are you sure you want to change the base?

Chore: BufferTransformer and Blowfish Refactor #4151

Conversation

reneme commented Jun 26, 2024 • edited Loading

coveralls commented Jun 27, 2024

reneme commented Aug 5, 2024 • edited Loading

coveralls commented Aug 5, 2024

reneme commented Jun 26, 2024 •

edited

Loading

reneme commented Aug 5, 2024 •

edited

Loading